
Application Data Sheet 



Application Information 

Application number:: 09/930,020 
Filing Date:: 08/14/01 
Application Type:: Regular 
Subject Matter:: Utility 
Suggested classification:: 
Suggested Group Art Unit:: 
CD-ROM or CD-R??:: 
Number of CD disks::
Number of copies of CDs::
Sequence Submission::
Computer Readable Form (CRF)?::
Number of copies of CRF::
Title:: Methods of Diagnosis of Colorectal Cancer, Compositions and Methods of Screening for Colorectal Cancer Modulators
Attorney Docket Number:: 018501-003100US
Request for Early Publication:: No
Request for Non-Publication:: No
Suggested Drawing Figure::
Total Drawing Sheets:: 0
Small Entity?:: Yes
Latin name::
Variety denomination name::
Petition included?:: No
Petition Type::
Licensed US Govt. Agency::
Contract or Grant Numbers One::
Secrecy Order in Parent Appl.:: No






Initial 10/16/01 



Applicant Information 

Applicant Authority Type:: Inventor
Primary Citizenship Country:: US
Status:: Full Capacity
Given Name:: Kurt
Middle Name:: C.
Family Name:: Gish
Name Suffix::
City of Residence:: San Francisco
State or Province of Residence:: CA
Country of Residence:: US
Street of Mailing Address:: 40 Perego Terrace, No. 2
City of Mailing Address:: San Francisco
State or Province of mailing address:: CA
Country of mailing address:: US
Postal or Zip Code of mailing address:: 94131



Applicant Authority Type:: Inventor
Primary Citizenship Country:: US
Status:: Full Capacity
Given Name:: David
Middle Name:: H.
Family Name:: Mack
Name Suffix::
City of Residence:: Menlo Park
State or Province of Residence:: CA
Country of Residence:: US
Street of Mailing Address:: 2076 Monterey Avenue
City of Mailing Address:: Menlo Park
State or Province of mailing address:: CA
Country of mailing address:: US
Postal or Zip Code of mailing address:: 94025






Applicant Authority Type:: Inventor

Primary Citizenship Country:: US 

Status:: Full Capacity 

Given Name:: Keith 

Middle Name:: E. 

Family Name:: Wilson 
Name Suffix:: 

City of Residence:: Redwood City 

State or Province of Residence:: CA 

Country of Residence:: US 

Street of Mailing Address:: 219 Jeter Street 

City of Mailing Address:: Redwood City 

State or Province of mailing address:: CA 

Country of mailing address:: US 



Postal or Zip Code of mailing address:: 94062 



Correspondence Information 

Correspondence Customer Number:: 20350 

Representative Information 

Representative Customer Number:: 20350

Domestic Priority Information 

Application:: Continuity Type:: Parent Application:: Parent Filing Date:: 

09/633,733 CIP utility September 15, 2000 

Foreign Priority Information 

Country:: Application number:: Filing Date:: 






Assignee Information 

Assignee Name:: Eos Biotechnology, Inc.
Street of mailing address:: 225A Gateway Boulevard
City of mailing address:: South San Francisco
State or Province of mailing address:: California
Country of mailing address:: USA
Postal or Zip Code of mailing address:: 94080






UNITED STATES PATENT AND TRADEMARK OFFICE 
DOCUMENT CLASSIFICATION BARCODE SHEET 








Level - 2 
Version 1.1 
Updated - 8/01/01

States Postal Service as first class mail in an envelope addressed to: Box
Missing Parts, Assistant Commissioner for Patents, Washington, D.C. 
20231 



PATENT 

Attorney Docket No.: 018501-003100US
Client Ref. No.: COCA 007-1 



On November 9, 2001

TOWNSEND and TOWNSEND and CREW LLP

By: Jill W Clarke



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re application of: 

GISH et al. 

Application No.: 09/930,020 

Filed: August 14, 2001

For: METHODS OF DIAGNOSIS OF 
COLORECTAL CANCER, 
COMPOSITIONS AND METHODS OF 
SCREENING FOR COLORECTAL 
CANCER MODULATORS 



Examiner: Not yet assigned 
Art Unit: 1642 

COMMUNICATION UNDER 
37 C.F.R. §§ 1.821-1.825
AND 

PRELIMINARY AMENDMENT 



Box SEQUENCE 

Assistant Commissioner for Patents 

Washington, D.C. 20231 

Sir: 

In response to the request to comply with Requirements for Patent 
Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures, 
37 C.F.R. §§ 1.821-1.825, that accompanied the Notice to File Missing Parts of 
Nonprovisional Application mailed September 13, 2001, Applicants submit herewith the 
required paper copy and computer readable copy of the Sequence Listing. Please amend 
the specification as follows. 



In the Specification:

Please replace paragraph [45] beginning at page 13, line 9 with the following:

—[45] The extracellular domains of transmembrane proteins are diverse; 
however, conserved motifs are found repeatedly among various extracellular domains. 
Conserved structure and/or functions have been ascribed to different extracellular motifs. 
For example, cytokine receptors are characterized by a cluster of cysteines and a 
WSXWS (SEQ ID NO:3) (W= tryptophan, S= serine, X=any amino acid) motif. 
Immunoglobulin-like domains are highly conserved. Mucin-like domains may be 
involved in cell adhesion and leucine-rich repeats participate in protein-protein 
interactions.— 
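The WSXWS motif described in paragraph [45] can be illustrated with a short pattern search. The following is only a minimal sketch in Python, assuming one-letter amino acid codes and treating X as any of the twenty standard residues; the function name `find_wsxws` and the example fragment are hypothetical and are not taken from the application.

```python
import re

# W = tryptophan, S = serine, X = any of the 20 standard amino acids.
# The lookahead lets overlapping occurrences be reported as well.
WSXWS_PATTERN = re.compile(r"(?=(WS[ACDEFGHIKLMNPQRSTVWY]WS))")

def find_wsxws(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each WSXWS hit."""
    return [(m.start() + 1, m.group(1))
            for m in WSXWS_PATTERN.finditer(protein.upper())]

# Hypothetical fragment invented for illustration; it is not a sequence
# from this application.
example = "MKTCCWSEWSPLA"
print(find_wsxws(example))  # [(6, 'WSEWS')]
```

Positions are reported 1-based so that they can be compared directly with residue numbering of the kind used in the sequence tables.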



Please replace the paragraph (TABLE 2) beginning at page 96, line 5 with the following: 



-TABLE 2 CBF9 DNA and Protein Sequences 



CBF9 DNA sequence (SEQ ID NO: 1) 



Gene name: ESTs
Unigene number: Hs.157601
Probeset Accession #: W07459
Nucleic Acid Accession #: AC005383
Coding Sequence: 328-2751 (underlined sequences correspond to start and stop codons)

1          11         21         31         41         51
|          |          |          |          |          |
GACAGTGTTC GCGGCTGCAC CGCTCGGAGG CTGGGTGACC CGCGTAGAAG TGAAGTACTT 60
TTTTATTTGC AGACCTGGGC CGATGCCGCT TTAAAAAACG CGAGGGGCTC TATGCACCTC 120
CCTGGCGGTA GTTCCTCCGA CCTCAGCCGG GTCGGGTCGT GCCGCCCTCT CCCAGGAGAG 180
ACAAACAGGT GTCCCACGTG GCAGCCGCGC CCCGGGCGCC CCTCCTGTGA TCCCGTAGCG 240
CCCCCTGGCC CGAGCCGCGC CCGGGTCTGT GAGTAGAGCC GCCCGGGCAC CGAGCGCTGG 300
TCGCCGCTCT CCTTCCGTTA TATCAACATG CCCCCTTTCC TGTTGCTGGA GGCCGTCTGT 360
GTTTTCCTGT TTTCCAGAGT GCCCCCATCT CTCCCTCTCC AGGAAGTCCA TGTAAGCAAA 420
GAAACCATCG GGAAGATTTC AGCTGCCAGC AAAATGATGT GGTGCTCGGC TGCAGTGGAC 480
ATCATGTTTC TGTTAGATGG GTCTAACAGC GTCGGGAAAG GGAGCTTTGA AAGGTCCAAG 540
CACTTTGCCA TCACAGTCTG TGACGGTCTG GACATCAGCC CCGAGAGGGT CAGAGTGGGA 600
GCATTCCAGT TCAGTTCCAC TCCTCATCTG GAATTCCCCT TGGATTCATT TTCAACCCAA 660
CAGGAAGTGA AGGCAAGAAT CAAGAGGATG GTTTTCAAAG GAGGGCGCAC GGAGACGGAA 720
CTTGCTCTGA AATACCTTCT GCACAGAGGG TTGCCTGGAG GCAGAAATGC TTCTGTGCCC 780
CAGATCCTCA TCATCGTCAC TGATGGGAAG TCCCAGGGGG ATGTGGCACT GCCATCCAAG 840
CAGCTGAAGG AAAGGGGTGT CACTGTGTTT GCTGTGGGGG TCAGGTTTCC CAGGTGGGAG 900
GAGCTGCATG CACTGGCCAG CGAGCCTAGA GGGCAGCACG TGCTGTTGGC TGAGCAGGTG 960
GAGGATGCCA CCAACGGCCT CTTCAGCACC CTCAGCAGCT CGGCCATCTG CTCCAGCGCC 1020
ACGCCAGACT GCAGGGTCGA GGCTCAGCCC TGTGAGCACA GGACGCTGGA GATGGTCCGG 1080
GAGTTCGCTG GCAATGCCCC ATGCTGGAGA GGATCGCGGC GGACCCTTGC GGTGCTGGCT 1140
GCACACTGTC CCTTCTACAG CTGGAAGAGA GTGTTCCTAA CCCACCCTGC CACCTGCTAC 1200
AGGACCACCT GCCCAGGCCC CTGTGACTCG CAGCCCTGCC AGAATGGAGG CACATGTGTT 1260
CCAGAAGGAC TGGACGGCTA CCAGTGCCTC TGCCCGCTGG CCTTTGGAGG GGAGGCTAAC 1320
TGTGCCCTGA AGCTGAGCCT GGAATGCAGG GTCGACCTCC TCTTCCTGCT GGACAGCTCT 1380
GCGGGCACCA CTCTGGACGG CTTCCTGCGG GCCAAAGTCT TCGTGAAGCG GTTTGTGCGG 1440
GCCGTGCTGA GCGAGGACTC TCGGGCCCGA GTGGGTGTGG CCACATACAG CAGGGAGCTG 1500
CTGGTGGCGG TGCCTGTGGG GGAGTACCAG GATGTGCCTG ACCTGGTCTG GAGCCTCGAT 1560
GGCATTCCCT TCCGTGGTGG CCCCACGCTG ACGGGCAGTG CCTTGCGGCA GGCGGCAGAG 1620
CGTGGCTTCG GGAGCGCCAC CAGGACAGGC CAGGACCGGC CACGTAGAGT GGTGGTTTTG 1680
CTCACTGAGT CACACTCCGA GGATGAGGTT GCGGGCCCAG CGCGTCACGC AAGGGCGCGA 1740
GAGCTGCTCC TGCTGGGTGT AGGCAGTGAG GCCGTGCGGG CAGAGCTGGA GGAGATCACA 1800
GGCAGCCCAA AGCATGTGAT GGTCTACTCG GATCCTCAGG ATCTGTTCAA CCAAATCCCT 1860
GAGCTGCAGG GGAAGCTGTG CAGCCGGCAG CGGCCAGGGT GCCGGACACA AGCCCTGGAC 1920
CTCGTCTTCA TGTTGGACAC CTCTGCCTCA GTAGGGCCCG AGAATTTTGC TCAGATGCAG 1980
AGCTTTGTGA GAAGCTGTGC CCTCCAGTTT GAGGTGAACC CTGACGTGAC ACAGGTCGGC 2040
CTGGTGGTGT ATGGCAGCCA GGTGCAGACT GCCTTCGGGC TGGACACCAA ACCCACCCGG 2100
GCTGCGATGC TGCGGGCCAT TAGCCAGGCC CCCTACCTAG GTGGGGTGGG CTCAGCCGGC 2160
ACCGCCCTGC TGCACATCTA TGACAAAGTG ATGACCGTCC AGAGGGGTGC CCGGCCTGGT 2220
GTCCCCAAAG CTGTGGTGGT GCTCACAGGC GGGAGAGGCG CAGAGGATGC AGCCGTTCCT 2280
GCCCAGAAGC TGAGGAACAA TGGCATCTCT GTCTTGGTCG TGGGCGTGGG GCCTGTCCTA 2340
AGTGAGGGTC TGCGGAGGCT TGCAGGTCCC CGGGATTCCC TGATCCACGT GGCAGCTTAC 2400
GCCGACCTGC GGTACCACCA GGACGTGCTC ATTGAGTGGC TGTGTGGAGA AGCCAAGCAG 2460
CCAGTCAACC TCTGCAAACC CAGCCCGTGC ATGAATGAGG GCAGCTGCGT CCTGCAGAAT 2520
GGGAGCTACC GCTGCAAGTG TCGGGATGGC TGGGAGGGCC CCCACTGCGA GAACCGTGAG 2580
TGGAGCTCTT GCTCTGTATG TGTGAGCCAG GGATGGATTC TTGAGACGCC CCTGAGGCAC 2640
ATGGCTCCCG TGCAGGAGGG CAGCAGCCGT ACCCCTCCCA GCAACTACAG AGAAGGCCTG 2700
GGCACTGAAA TGGTGCCTAC CTTCTGGAAT GTCTGTGCCC CAGGTCCTTA GAATGTCTGC 2760
TTCCCGCCGT GGCCAGGACC ACTATTCTCA CTGAGGGAGG AGGATGTCCC AACTGCAGCC 2820
ATGCTGCTTA GAGACAAGAA AGCAGCTGAT GTCACCCACA AACGATGTTG TTGAAAAGTT 2880
TTGATGTGTA AGTAAATACC CACTTTCTGT ACCTGCTGTG CGTTGTTGAG GCTATGTCAT 2940
CTGCCACCTT TCCCTTGAGG ATAAACAAGG GGTCCTGAAG ACTTAAATTT AGCGGCCTGA 3000
CGTTCCTTTG CACACAATCA ATGCTCGCCA GAATGTTGTT GACACAGTAA TGCCCAGCAG 3060
AGGCCTTTAC TAGAGCATCC TTTGGACGGC GAAGGCCACG GCCTTTCAAG ATGGAAAGCA 3120
GCAGCTTTTC CACTTCCCCA GAGACATTCT GGATGCATTT GCATTGAGTC TGAAAGGGGG 3180
CTTGAGGGAC GTTTGTGACT TCTTGGCGAC TGCCTTTTGT GTGTGGAAGA GACTTGGAAA 3240
GGTCTCAGAC TGAATGTGAC CAATTAACCA GCTTGGTTGA TGATGGGGGA GGGGCTGAGT 3300
TGTGCATGGG CCCAGGTCTG GAGGGCCACG TAAAATCGTT CTGAGTCGTG AGCAGTGTCC 3360
ACCTTGAAGG TCTTC



CBF9 Protein sequence (SEQ ID NO:2)

Gene name: ESTs
Unigene number: Hs.157601
Protein Accession #: none found
Signal sequence: 1-17
Transmembrane domains: none found
VGW domains: 49-223; 341-518; 529-706
EGF domains: 298-333; 715-748
Cellular Localization: plasma membrane

1          11         21         31         41         51
|          |          |          |          |          |
MPPFLLLEAV CVFLFSRVPP SLPLQEVHVS KETIGKISAA SKMMWCSAAV DIMFLLDGSN 60
SVGKGSFERS KHFAITVCDG LDISPERVRV GAFQFSSTPH LEFPLDSFST QQEVKARIKR 120
MVFKGGRTET ELALKYLLHR GLPGGRNASV PQILIIVTDG KSQGDVALPS KQLKERGVTV 180
FAVGVRFPRW EELHALASEP RGQHVLLAEQ VEDATNGLFS TLSSSAICSS ATPDCRVEAH 240
PCEHRTLEMV REFAGNAPCW RGSRRTLAVL AAHCPFYSWK RVFLTHPATC YRTTCPGPCD 300
SQPCQNGGTC VPEGLDGYQC LCPLAFGGEA NCALKLSLEC RVDLLFLLDS SAGTTLDGFL 360
RAKVFVKRFV RAVLSEDSRA RVGVATYSRE LLVAVPVGEY QDVPDLVWSL DGIPFRGGPT 420
LTGSALRQAA ERGFGSATRT GQDRPRRVVV LLTESHSEDE VAGPARHARA RELLLLGVGS 480
EAVRAELEEI TGSPKHVMVY SDPQDLFNQI PELQGKLCSR QRPGCRTQAL DLVFMLDTSA 540
SVGPENFAQM QSFVRSCALQ FEVNPDVTQV GLVVYGSQVQ TAFGLDTKPT RAAMLRAISQ 600
APYLGGVGSA GTALLHIYDK VMTVQRGARP GVPKAVVVLT GGRGAEDAAV PAQKLRNNGI 660
SVLVVGVGPV LSEGLRRLAG PRDSLIHVAA YADLRYHQDV LIEWLCGEAK QPVNLCKPSP 720
CMNEGSCVLQ NGSYRCKCRD GWEGPHCENR EWSSCSVCVS QGWILETPLR HMAPVQEGSS 780
RTPPSNYREG LGTEMVPTFW NVCAPGP

Please insert the accompanying paper copy of the Sequence Listing, page numbers 1 to 4,
at the end of the application. 

REMARKS 

Applicants request entry of this amendment in adherence with 37 C.F.R.
§§ 1.821 to 1.825. This amendment is accompanied by a floppy disk containing the above
named sequences, SEQ ID NOS:1-3, in computer readable form, and a paper copy of the
sequence information which has been printed from the floppy disk.

The information contained in the computer readable disk was prepared
through the use of the software program "PatentIn" and is identical to that of the paper
copy. This amendment contains no new matter.

Attached hereto is a marked-up version of the changes made to the 
Specification and Abstract by the current Amendment. The attached pages are captioned 
"VERSION WITH MARKINGS TO SHOW CHANGES MADE." 

If the Examiner believes a telephone conference would expedite 
prosecution of this application, please telephone the undersigned at 415-576-0200. 



TOWNSEND and TOWNSEND and CREW LLP 

Two Embarcadero Center, 8th Floor

San Francisco, California 94111-3834 

Tel: (415) 576-0200 

Fax: (415) 576-0300

KLB:dmw

VERSION WITH MARKINGS TO SHOW CHANGES MADE
In the Specification: 

Paragraph [45] beginning at line 23 of page 6 has been amended as follows: 



[45] The extracellular domains of transmembrane proteins are diverse;
however, conserved motifs are found repeatedly among various extracellular domains.
Conserved structure and/or functions have been ascribed to different extracellular motifs.
For example, cytokine receptors are characterized by a cluster of cysteines and a
WSXWS (SEQ ID NO:3) (W= tryptophan, S= serine, X=any amino acid) motif.
Immunoglobulin-like domains are highly conserved. Mucin-like domains may be
involved in cell adhesion and leucine-rich repeats participate in protein-protein
interactions.

Paragraph (TABLE 2) beginning at line 5 of page 96 has been amended as follows:



TABLE 2 CBF9 DNA and Protein Sequences 



CBF9 DNA sequence (SEQ ID NO:1)

Gene name: ESTs
Unigene number: Hs.157601
Probeset Accession #: W07459
Nucleic Acid Accession #: AC005383
Coding Sequence: 328-2751 (underlined sequences correspond to start and stop codons)

1          11         21         31         41         51
|          |          |          |          |          |

GACAGTGTTC GCGGCTGCAC CGCTCGGAGG CTGGGTGACC CGCGTAGAAG TGAAGTACTT 60 

TTTTATTTGC AGACCTGGGC CGATGCCGCT TTAAAAAACG CGAGGGGCTC TATGCACCTC 120 

CCTGGCGGTA GTTCCTCCGA CCTCAGCCGG GTCGGGTCGT GCCGCCCTCT CCCAGGAGAG 180 

ACAAACAGGT GTCCCACGTG GCAGCCGCGC CCCGGGCGCC CCTCCTGTGA TCCCGTAGCG 240 

CCCCCTGGCC CGAGCCGCGC CCGGGTCTGT GAGTAGAGGG GCCCGGGCAC CGAGCGCTGG 300 

TCGCCGCTCT CCTTCCGTTA TATCAACATG CCCCCTTTCC TGTTGCTGGA GGCCGTCTGT 360 

GTTTTCCTGT TTTCCAGAGT GCCCCCATCT CTCCCTCTCC AGGAAGTCCA TGTAAGCAAA 420 

GAAACCATCG GGAAGATTTC AGCTGCCAGC AAAATGATGT GGTGCTCGGC TGCAGTGGAC 480 

ATCATGTTTC TGTTAGATGG GTCTAACAGC GTCGGGAAAG GGAGCTTTGA AAGGTCCAAG 540 

CACTTTGCCA TCACAGTCTG TGACGGTCTG GACATCAGCC CCGAGAGGGT CAGAGTGGGA 600 

GCATTCCAGT TCAGTTCCAC TCCTCATCTG GAATTCCCCT TGGATTCATT TTCAACCCAA 660

CAGGAAGTGA AGGCAAGAAT CAAGAGGATG GTTTTCAAAG GAGGGCGCAC GGAGACGGAA 720 

CTTGCTCTGA AATACCTTCT GCACAGAGGG TTGCCTGGAG GCAGAAATGC TTCTGTGCCC 780 

CAGATCCTCA TCATCGTCAC TGATGGGAAG TCCCAGGGGG ATGTGGCACT GCCATCCAAG 840

CAGCTGAAGG AAAGGGGTGT CACTGTGTTT GCTGTGGGGG TCAGGTTTCC CAGGTGGGAG 900 

GAGCTGCATG CACTGGCCAG CGAGCCTAGA GGGCAGCACG TGCTGTTGGC TGAGCAGGTG 960 

GAGGATGCCA CCAACGGCCT CTTCAGCACC CTCAGCAGCT CGGCCATCTG CTCCAGCGCC 1020 

ACGCCAGACT GCAGGGTCGA GGCTCAGCCC TGTGAGCACA GGACGCTGGA GATGGTCCGG 1080

GAGTTCGCTG GCAATGCCCC ATGCTGGAGA GGATCGCGGC GGACCCTTGC GGTGCTGGCT 1140 

GCACACTGTC CCTTCTACAG CTGGAAGAGA GTGTTCCTAA CCCACCCTGC CACCTGCTAC 1200 

AGGACCACCT GCCCAGGCCC CTGTGACTCG CAGCCCTGCC AGAATGGAGG CACATGTGTT 1260 

CCAGAAGGAC TGGACGGCTA CCAGTGCCTC TGCCCGCTGG CCTTTGGAGG GGAGGCTAAC 1320 

TGTGCCCTGA AGCTGAGCCT GGAATGCAGG GTCGACCTCC TCTTCCTGCT GGACAGCTCT 1380 

GCGGGCACCA CTCTGGACGG CTTCCTGCGG GCCAAAGTCT TCGTGAAGCG GTTTGTGCGG 1440 

GCCGTGCTGA GCGAGGACTC TCGGGCCCGA GTGGGTGTGG CCACATACAG CAGGGAGCTG 1500 

CTGGTGGCGG TGCCTGTGGG GGAGTACCAG GATGTGCCTG ACCTGGTCTG GAGCCTCGAT 1560

GGCATTCCCT TCCGTGGTGG CCCCACCCTG ACGGGCAGTG CGTTGCGGCA GGCGGCAGAG 1620 

CGTGGCTTCG GGAGCGCCAC CAGGACAGGC CAGGACCGGC CACGTAGAGT GGTGGTTTTG 1680 

CTCACTGAGT CACACTCCGA GGATGAGGTT GCGGGCCCAG CGCGTCACGC AAGGGCGCGA 1740

GAGCTGCTCC TGCTGGGTGT AGGCAGTGAG GCCGTGCGGG CAGAGCTGGA GGAGATCACA 1800 

GGCAGCCCAA AGCATGTGAT GGTCTACTCG GATCCTCAGG ATCTGTTCAA CCAAATCCCT 1860 

GAGCTGCAGG GGAAGCTGTG CAGCCGGCAG CGGCCAGGGT GCCGGACACA AGCCCTGGAC 1920

CTCGTCTTCA TGTTGGACAC CTCTGCCTCA GTAGGGCCCG AGAATTTTGC TCAGATGCAG 1980 

AGCTTTGTGA GAAGCTGTGC CCTCCAGTTT GAGGTGAACC CTGACGTGAC ACAGGTCGGC 2040 

CTGGTGGTGT ATGGCAGCCA GGTGCAGACT GCCTTCGGGC TGGACACCAA ACCCACCCGG 2100 

GCTGCGATGC TGCGGGCCAT TAGCCAGGCC CCCTACCTAG GTGGGGTGGG CTCAGCCGGC 2160 

ACCGCCCTGC TGCACATCTA TGACAAAGTG ATGACCGTCC AGAGGGGTGC CCGGCCTGGT 2220

GTCCCCAAAG CTGTGGTGGT GCTCACAGGC GGGAGAGGCG CAGAGGATGC AGCCGTTCCT 2280

GCCCAGAAGC TGAGGAACAA TGGCATCTCT GTCTTGGTCG TGGGCGTGGG GCCTGTCCTA 2340

AGTGAGGGTC TGCGGAGGCT TGCAGGTCCC CGGGATTCCC TGATCCACGT GGCAGCTTAC 2400 

GCCGACCTGC GGTACCACCA GGACGTGCTC ATTGAGTGGC TGTGTGGAGA AGCCAAGCAG 2460 

CCAGTCAACC TCTGCAAACC CAGCCCGTGC ATGAATGAGG GCAGCTGCGT CCTGCAGAAT 2520 

GGGAGCTACC GCTGCAAGTG TCGGGATGGC TGGGAGGGCC CCCACTGCGA GAACCGTGAG 2580

TGGAGCTCTT GCTCTGTATG TGTGAGCCAG GGATGGATTC TTGAGACGCC CCTGAGGCAC 2640

ATGGCTCCCG TGCAGGAGGG CAGCAGCCGT ACCCCTCCCA GCAACTACAG AGAAGGCCTG 2700 

GGCACTGAAA TGGTGCCTAC CTTCTGGAAT GTCTGTGCCC CAGGTCCTTA GAATGTCTGC 2760

TTCCCGCCGT GGCCAGGACC ACTATTCTCA CTGAGGGAGG AGGATGTCCC AACTGCAGCC 2820 

ATGCTGCTTA GAGACAAGAA AGCAGCTGAT GTCACCCACA AACGATGTTG TTGAAAAGTT 2880 

TTGATGTGTA AGTAAATACC CACTTTCTGT ACCTGCTGTG CCTTGTTGAG GCTATGTCAT 2940 

CTGCCACCTT TCCCTTGAGG ATAAACAAGG GGTCCTGAAG ACTTAAATTT AGCGGCCTGA 3000 

CGTTCCTTTG CACACAATCA ATGCTCGCCA GAATGTTGTT GACACAGTAA TGCCCAGCAG 3060

AGGCCTTTAC TAGAGCATCC TTTGGACGGC GAAGGCCACG GCCTTTCAAG ATGGAAAGCA 3120 

GCAGCTTTTC CACTTCCCCA GAGACATTCT GGATGCATTT GCATTGAGTC TGAAAGGGGG 3180 

CTTGAGGGAC GTTTGTGACT TCTTGGCGAC TGCCTTTTGT GTGTGGAAGA GACTTGGAAA 3240 

GGTCTCAGAC TGAATGTGAC CAATTAACCA GCTTGGTTGA TGATGGGGGA GGGGCTGAGT 3300 

TGTGCATGGG CCCAGGTCTG GAGGGCCACG TAAAATCGTT CTGAGTCGTG AGCAGTGTCC 3360 
ACCTTGAAGG TCTTC 
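
The coding-sequence coordinates stated in TABLE 2 (328-2751) can be sanity-checked with a few lines of arithmetic and a translation of the first codons. The sketch below is an illustration only, not part of the filing: it assumes the coordinates are 1-based and inclusive, uses the standard genetic code, and takes as input the table row covering positions 301-360. The names `codon_count`, `translate`, and `ROW_301_360` are hypothetical.

```python
from itertools import product

# Coordinates as stated in TABLE 2, assumed 1-based and inclusive.
CDS_START, CDS_END = 328, 2751

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_count(start: int, end: int) -> int:
    """Number of codons in an inclusive 1-based interval (stop codon included)."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("interval length is not a multiple of three")
    return length // 3

def translate(dna: str) -> str:
    """Translate complete codons only; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Row of the table covering nucleotide positions 301-360.
ROW_301_360 = ("TCGCCGCTCT CCTTCCGTTA TATCAACATG "
               "CCCCCTTTCC TGTTGCTGGA GGCCGTCTGT").replace(" ", "")

print(codon_count(CDS_START, CDS_END))          # 808
print(translate(ROW_301_360[CDS_START - 301:]))  # MPPFLLLEAVC
```

Under these assumptions the interval spans 808 codons, that is, 807 residues plus the stop codon, and the first translated residues match the start of the listed protein sequence.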



CBF9 Protein sequence (SEQ ID NO:2)

Gene name: ESTs
Unigene number: Hs.157601
Protein Accession #: none found
Signal sequence: 1-17
Transmembrane domains: none found
VGW domains: 49-223; 341-518; 529-706
EGF domains: 298-333; 715-748
Cellular Localization: plasma membrane

1          11         21         31         41         51
|          |          |          |          |          |
MPPFLLLEAV CVFLFSRVPP SLPLQEVHVS KETIGKISAA SKMMWCSAAV DIMFLLDGSN 60
SVGKGSFERS KHFAITVCDG LDISPERVRV GAFQFSSTPH LEFPLDSFST QQEVKARIKR 120
MVFKGGRTET ELALKYLLHR GLPGGRNASV PQILIIVTDG KSQGDVALPS KQLKERGVTV 180
FAVGVRFPRW EELHALASEP RGQHVLLAEQ VEDATNGLFS TLSSSAICSS ATPDCRVEAH 240
PCEHRTLEMV REFAGNAPCW RGSRRTLAVL AAHCPFYSWK RVFLTHPATC YRTTCPGPCD 300
SQPCQNGGTC VPEGLDGYQC LCPLAFGGEA NCALKLSLEC RVDLLFLLDS SAGTTLDGFL 360
RAKVFVKRFV RAVLSEDSRA RVGVATYSRE LLVAVPVGEY QDVPDLVWSL DGIPFRGGPT 420
LTGSALRQAA ERGFGSATRT GQDRPRRVVV LLTESHSEDE VAGPARHARA RELLLLGVGS 480
EAVRAELEEI TGSPKHVMVY SDPQDLFNQI PELQGKLCSR QRPGCRTQAL DLVFMLDTSA 540
SVGPENFAQM QSFVRSCALQ FEVNPDVTQV GLVVYGSQVQ TAFGLDTKPT RAAMLRAISQ 600
APYLGGVGSA GTALLHIYDK VMTVQRGARP GVPKAVVVLT GGRGAEDAAV PAQKLRNNGI 660
SVLVVGVGPV LSEGLRRLAG PRDSLIHVAA YADLRYHQDV LIEWLCGEAK QPVNLCKPSP 720
CMNEGSCVLQ NGSYRCKCRD GWEGPHCENR EWSSCSVCVS QGWILETPLR HMAPVQEGSS 780
RTPPSNYREG LGTEMVPTFW NVCAPGP